Abstract: The blogosphere can be construed as a knowledge
network made of bloggers who interact through a social
network to share, exchange or produce information. We
claim that the social and semantic dimensions are essentially
co-determined and propose to investigate the co-evolutionary
dynamics of the blogosphere by examining two intertwined issues:
first, how does knowledge distribution drive new interactions and
thus influence the social network topology? Second, what role
do structural network properties play in the circulation of
information in the system? We adopt an empirical standpoint by
analyzing the semantic and social activity of a portion of the US
political blogosphere, monitored over a period of four months.

I. The "blogosphere" as a socio-semantic system

The blogosphere essentially gathers individuals who share, 
exchange and produce information and interact online by 
posting comments or referencing each other. As such, it is 
a socio-semantic network, in the sense that each blog can be 
characterized both by a relational profile, determined by its 
position in the underlying social network, and by a semantic 
profile, which describes cognitive attributes. 

Adopting a dual perspective on these knowledge networks
is likely to provide a better understanding of the key mechanisms
underlying their organization and evolution, because both
dimensions co-evolve. For instance, network dynamics is
likely to be affected by the distribution of knowledge, if we
assume that semantic homophily is a driving force behind
network evolution. Conversely, structural features of the implicit
social network may give rise to specific patterns of knowledge
distribution: by supporting diffusion processes, social networks
may affect information circulation among bloggers in various ways.

We propose to investigate empirically the coevolutionary
dynamics of a portion of the blogosphere by examining the
following two intertwined issues:

(i) how does knowledge distribution influence the appearance of
new relationships, thereby shaping the topology?

(ii) how, in turn, do structural network properties affect
the way information circulates in the system?

Related work 

Blogs have attracted much attention as an empirical goldmine
for quantitative social science and, more theoretically, as a
rich instance of a social and semantic complex system.
This recent effort is part of a broader
interest in online knowledge-based networks, including for
instance wikis (9) or content-sharing websites such as Flickr
(10), which fundamentally are virtual spaces dedicated to the
production, sharing, and circulation of opinion, multimedia
resources and, more broadly, information, and where various
kinds of social interactions and collaborations are channeled
by so-called "web 2.0" technologies.

Political blogging itself is also the focus of a sizable part of
the literature, in that it allows investigation of multiple current
issues, including the influence of bloggers over media coverage or
over the general political debate (11; 12; 13; 14).

Many of these studies approach the blogosphere or
blogspace from a social network perspective, aiming at measuring
and characterizing topological properties including
link configurations, cohesiveness phenomena and the existence
of groups or communities (15; 16; 17). Beyond a strictly
structural approach, static descriptions of the joint distribution
of topics and social configuration of a blogosphere have been
achieved by (11); however, the dynamic interrelations of these
two dimensions remain an open problem.

Further, studies considering the blogosphere as an informational
system have mainly focused on investigating topic and
opinion evolution, thanks to the fine-grained dynamics of
the underlying data, thereby developing automatic trend
detection methods, characterizing opinion dynamics using
sentiment analysis, or exploring the coexistence of
chatters and spikes in blog conversations and their cyclic
behaviors (19; 20; 21), inter alia.



Blogs, and more generally Internet-based communication
systems, have provided a novel opportunity for diffusion
studies through the in-vivo observation of what is generally
referred to as "cascade dynamics" (22; 23).
This feature is notably common to viral marketing studies
on large-scale online datasets exhibiting diffusion phenomena,
including (6), which explores the distribution of the probability
of purchasing a cultural consumer good when a user of a large
online retailer receives a certain number of recommendations
sent by her friends, and (24), which computes the probability for
one to join a LiveJournal community when she already has
some friends in it. More specifically, cascades in blog networks
have been extensively described by considering chains of posts
citing each other as information pathways. In these
cases, as well as in other studies not restricted to blog networks
(26), the focus has been put on the spread of influence through the
study of the topological properties of cascades (such as typical
patterns of cascade, distribution of cascade sizes, etc.).

Finally, little is known yet about the dynamic underpinnings
of content distribution over agents with respect to topology,
and about the processes underlying the actual formation of
heterogeneous topical communities; or, more broadly, about the
very intertwining of social and semantic dimensions and their
effect on information propagation, with the notable exception
of CI). 

Most often, only one of the social and semantic dimensions
is considered. On the one hand, link creation patterns are
generally appraised through structural attributes
rather than cognitive or semantic properties of blogs. As for
diffusion, either content evolution is studied independently
of the topology, or topology is the only reference frame for
diffusion (one observes the propagation of links of the social
network along the social network, i.e. some sort of structural
transitivity). On the other hand, endeavors at understanding
what triggers or increases diffusion have given a prevailing
role to ego-centered characterizations (i.e. diffusion is often
seen as stemming from individual properties, rather than from the
shape of the network at large).

In short, in terms of diffusion, taking into account both
the network structure and a transmission process on objects
distinct from this structure remains a challenge. In this
respect, we also aim at assessing how actual content diffusion
pathways can be correlated with the (mostly distinct) underlying
social network that supports such information circulation.

Outline 

The paper is organized as follows: in the next section we
first introduce the empirical protocol. Section III focuses on
the dynamics of link creation in the comment and post networks
according to both structural and semantic features, while
Sec. IV investigates the dynamics of information propagation
according to the underlying topology.

II. Experimental framework 

A. A bounded subset of the US blogosphere 

Our study is based on the observation of the activity of a
medium-sized yet topically well-bounded portion of the US
political blogosphere which has been gathered by LINKFLUENCE
under the "PresidentialWatch08" project.¹

The dataset consists of 1,066 blogs, hereafter denoted by B,
monitored over the course of four months, from Nov 1, 2007
to Feb 29, 2008. For each blog we crawled the date and
full-text content, including hyperlinks, of each post published
during the observation period, totaling 71,376 posts.



B. A dynamic network 

The couple (B, C) is the blog network, where C denotes
post citation links as an adjacency matrix of size |B| x |B|.
This data is additionally dynamic, with a temporal granularity
of one day: we deal with $C_t$ where t ranges from 1 to 121:
$C_t(i,j) = 1$ if i cites j in a post at time t; 0 otherwise. In
the remainder, t may be omitted in the notations when it is
implicit.

We extracted 229,736 dated edges in C, of which 15,032
are unique (non-repeated links). We finally define an
aggregated weighted network as $\mathbf{C}_t = \sum_{t'=1}^{t} C_{t'}$.
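
As an illustration of this aggregation, the following minimal Python sketch builds the aggregated weights from a list of dated citation events. The function name `aggregate_citations`, the `(day, citing, cited)` event format and the toy data are assumptions made for the example; they are not taken from the dataset or from the authors' code.

```python
from collections import Counter

def aggregate_citations(events, until):
    """Aggregated weights: (i, j) -> number of days d <= until on which i cited j.

    `events` is an iterable of (day, citing_blog, cited_blog) triples
    (hypothetical input format). Several citations of j by i on the same
    day count once, since the daily matrix only holds 0 or 1.
    """
    daily_links = {(day, i, j) for day, i, j in events}
    return Counter((i, j) for day, i, j in daily_links if day <= until)

# Toy data (hypothetical): three blogs observed over four days.
events = [(1, "a", "b"), (2, "a", "b"), (2, "a", "b"), (2, "c", "a"), (4, "b", "c")]
weights = aggregate_citations(events, until=4)
print(dict(weights))   # aggregated link weights at day 4
print(len(weights))    # number of unique (non-repeated) links
```

Storing the network as a sparse dictionary of weights rather than a dense |B| x |B| matrix is only a convenience for the sketch.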

C. An epistemic network 

Besides this structure, content defines a semantic dimension:
posts typically deal with specific issues,
sometimes broadcasting particular documents. Although the
existence of a clear-cut distinction between high-level topics
and specific cultural items may be debated, we assume that
(i) textual contents broadly define the various issues a blogger
addresses, whereas (ii) explicit URLs (which refer to hyperlinked
documents and which are not citation links) define the
various specific digital resources a blogger spreads around.

Subsequently, we distinguish: 

• a set of high-level topics W relevant to political
commentary in our context, chosen among the most frequent
terms in the corpus (thus excluding rhetorical terms). W is
made of 79 syntagms ranging from names of politicians to
issues which kept the blogosphere busy during the presidential
campaign, such as "climate change", "national
security", "super Tuesday", "tax cuts", "human rights",
etc.

• and a set of URLs, denoted U, which cannot be confused
with links in the citation network: these are simply
online videos, news media articles, etc. U is a selection
of 96,637 URLs (of length larger than 10 characters).
Note that these URLs are taken from the content
of posts only, not from whole webpages, so that U should
exclude banners, platform-related links and ads, inter alia;
it only covers links explicitly cited by bloggers in their
posts.

More precisely, with respect to W, we introduce a temporal
matrix $W_t$ which tracks the contents published by bloggers:
$W_t(i,w)$ equals 1 if term $w \in W$ appears in a post published
on blog i at time t, 0 otherwise. Finally, the |W|-dimensional
vector $\mathbf{W}_t(i)$, defined as the sum of rows $W_{t'}(i)$
for $t' \le t$, denotes the aggregation of all topics addressed by
blog i until t.

$\mathbf{W}_t(i)$ can be seen as the semantic profile of i at t.

In a similar fashion for U, we introduce a temporal matrix $U_t$
such that $U_t(i,u)$ equals 1 if blog i explicitly refers to a URL
$u \in U$ in a post published at time t. Since this matrix will
mostly be used for diffusion purposes, we need not define in
the present study an aggregated quantity for URL usage.



1 http://linkfluence.net, http://presidentialwatch08.com 



III. Evolution of topology: The content-based dynamics of link creation

We first study the link creation dynamics with respect
to the configuration of the blogosphere, on both a social
and a semantic level. In particular, we examine the constraint
induced by the current socio-semantic network on future
citation patterns. The structure of both social and semantic
configurations may, at least partially, determine link creations.
Quite straightforwardly, remoteness in both spaces is likely to
modify the landscape of potential relations and, subsequently,
the likelihood of interaction. In what follows we focus
notably on citation propensity with respect to simple notions
of topological as well as semantic distance: how do proximity,
increased attention or homophily processes actually impact
authority attribution in this portion of the blogosphere?

A. Proximity and distance 

To begin with, we define a series of simple distances which
are all based on aggregated data at t, denoted by "bold"
notations ($\mathbf{C}$ and $\mathbf{W}$); that is, we assume each
notion of distance between two blogs to depend on the whole
history of posting and linking at t.

1) Dissimilarity as a semantic distance: To semantically
compare a pair of bloggers i and j at t, we adopt a classical
cosine-based measure of dissimilarity on their profile vectors
$\mathbf{W}_t(i)$ and $\mathbf{W}_t(j)$. We denote this semantic
distance by $\delta$: concretely, identical profiles yield a $\delta$
of 0, whereas strictly disjoint/orthogonal profiles are separated by
a $\delta$ of 1; intermediate values from 0 to 1 indicate
increasing levels of dissimilarity.²
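
The following Python sketch illustrates a cosine-based dissimilarity on tf-idf-weighted profiles, in the spirit of the measure just described. It is a minimal sketch under stated assumptions: the function names, the dictionary representation of profiles, the convention for empty profiles and the toy counts are all hypothetical, and the exact weighting used in the study may differ.

```python
import math

def tfidf_profiles(profiles):
    """profiles: blog -> {term: aggregated count}. Returns tf-idf adjusted profiles."""
    n_blogs = len(profiles)
    blogs_using = {}  # term -> number of blogs where the term appears
    for counts in profiles.values():
        for term, count in counts.items():
            if count > 0:
                blogs_using[term] = blogs_using.get(term, 0) + 1
    return {
        blog: {term: count * math.log(n_blogs / blogs_using[term])
               for term, count in counts.items() if count > 0}
        for blog, counts in profiles.items()
    }

def semantic_distance(p, q):
    """One minus the cosine similarity of two sparse profile vectors."""
    dot = sum(weight * q.get(term, 0.0) for term, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # assumption: an empty profile is treated as maximally distant
    return 1.0 - dot / (norm_p * norm_q)

# Toy profiles (hypothetical counts).
raw = {
    "a": {"iraq": 3, "tax cuts": 1},
    "b": {"iraq": 6, "tax cuts": 2},
    "c": {"climate change": 4},
}
adjusted = tfidf_profiles(raw)
print(semantic_distance(adjusted["a"], adjusted["b"]))  # proportional profiles
print(semantic_distance(adjusted["a"], adjusted["c"]))  # disjoint profiles
```

Because the cosine is insensitive to the overall scale of each vector, blogs a and b (which address the same topics in the same proportions) end up at distance 0 even though b posts twice as much.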

2) Topological distances: Because network links are oriented,
topological distances will be asymmetric measures,
contrary to the semantic distance $\delta$.

We first classically define the social distance $d_t(i,j)$ between
two blogs i and j in C as the length of the shortest path
linking i to j in that network, irrespective of link weights. This
basically refers to the number of steps one has to follow to
reach another blog. On the example of Fig. 1 (left), $d_t(b, f) = 3$.
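
A minimal sketch of this unweighted, directed distance is a breadth-first search. The function name `social_distance`, the adjacency-list format and the toy graph below are assumptions for illustration; the toy graph is not the network of Fig. 1.

```python
from collections import deque

def social_distance(links, source, target):
    """Number of hops on the shortest directed path from source to target.

    `links` maps each blog to the blogs it cites (weights are ignored).
    Returns float("inf") when target cannot be reached.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")

# Toy directed graph (hypothetical).
links = {"b": ["c"], "c": ["e"], "e": ["f"], "f": ["b"]}
print(social_distance(links, "b", "f"))  # three hops: b -> c -> e -> f
print(social_distance(links, "f", "b"))  # one hop: the distance is asymmetric
```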

2 To this end, we first need to carry out a normalization procedure to weight
term occurrences properly, following the canonical "tf-idf" approach used
extensively in information retrieval, famously introduced in the vector-space
model. This approach more precisely consists in weighting the "term
frequency", "tf" (so that the most used terms in a given blog are more important)
with the so-called "inverse document frequency", "idf", based on the frequency
of the term in the whole corpus of blogs (so that rarer terms in the blogosphere
are weighted more: this takes into account the discriminating power of terms
which, while usually rare in the corpus, are being abnormally mentioned by
a given blog).

For this computation, profiles $\mathbf{W}_t(i)$ are thus actually replaced by
tf-idf-adjusted profiles $\mathbf{W}^*_t(i)$ such that:

$$\mathbf{W}^*_t(i,w) = \mathbf{W}_t(i,w) \cdot \log \frac{|B|}{|\{j \in B : \mathbf{W}_t(j,w) > 0\}|}$$

where the "log" part of the formula is the inverse ratio of the number of blogs
where term w appears over the total number of blogs. Then, we obtain the
dissimilarity between blogs i and j as one minus the scalar product of their
adjusted profiles divided by the product of their norms:

$$\delta_t(i,j) = 1 - \frac{\mathbf{W}^*_t(i) \cdot \mathbf{W}^*_t(j)}{\|\mathbf{W}^*_t(i)\| \, \|\mathbf{W}^*_t(j)\|}$$






Fig. 1. Left: An example of weighted citation network $\mathbf{C}_t$: weights
trivially correspond to the number of observed links between blogs at some time t.
Middle and right: corresponding attention (a) and detachment (D) values, respectively.
For example, blog b cited c twice out of a total of 1 + 2 + 3 = 6 citation
links; its attention toward c is thus $a_t(b,c) = 1/3$. Detachment $D_t(b,c)$ equals
the inverse of the attention from b to c, i.e. 3. The detachment-based distance
$\partial_t(b,c)$ is also 3 (since b → c is the shortest weighted path from b to c),
while $\partial_t(b,e) = 2 + 5/3 = 11/3$.



Since influence effects relate to attentional features, we
suggest that a notion of remoteness based on "attention" may
also be relevant. In this respect, we define a dyadic attention
$a_t$ by normalizing every row of $\mathbf{C}_t$:

$$a_t(i,j) = \frac{\mathbf{C}_t(i,j)}{\sum_{k \in B} \mathbf{C}_t(i,k)} \qquad (1)$$

$a_t(i,j)$ is thus simply the proportion of links going from i
to j among all outgoing links from i. Higher values indicate
higher focus by i on j. Note that a similar notion is called
"influence matrix" in (15).

Now, we can define an opposite notion to attention by considering
inverse values of a, defining a measure of detachment
as $D_t(i,j) = 1 / a_t(i,j)$. In other words, $D(i,j)$ can be compared
with a relative cost for information to reach i directly from
j. It is equal to infinity if there is no link from i to j, and it
decreases when the attention of i towards j grows. For instance,
if i has three times more links towards j than
towards k, then i's detachment to j is three times lower.

Finally, we define a detachment-based distance as the
minimal weighted distance (28) in a weighted graph $D$ where
link weights from i to j are the non-infinite values $D(i,j)$. We
denote this detachment-based distance $\partial(i,j)$; as such, it
can be considered as a measure of attentional remoteness,
i.e. lightweight attentional paths will correspond to higher
detachment-based distances. See an illustration on Fig. 1.
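
The chain from link weights to attention, detachment and detachment-based distance can be sketched as follows in Python, using Dijkstra's algorithm for the minimal weighted distance. Function names and the toy weights are assumptions: the toy network is not the one of Fig. 1, it was merely chosen so that the values quoted in the caption of Fig. 1 for blogs b, c and e (detachment 3, distances 3 and 11/3) are reproduced.

```python
import heapq

def detachment(weights):
    """weights: (i, j) -> aggregated number of citations from i to j.

    Detachment is the inverse of attention, i.e. (all outgoing links of i)
    divided by (links from i to j). Absent links are simply not stored.
    """
    out_total = {}
    for (i, _), w in weights.items():
        out_total[i] = out_total.get(i, 0) + w
    return {(i, j): out_total[i] / w for (i, j), w in weights.items() if w > 0}

def detachment_distance(weights, source, target):
    """Minimal sum of detachment values along a directed path (Dijkstra)."""
    neighbours = {}
    for (i, j), cost in detachment(weights).items():
        neighbours.setdefault(i, []).append((j, cost))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best[node]:
            continue  # stale heap entry
        for nxt, cost in neighbours.get(node, ()):
            candidate = dist + cost
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return float("inf")

# Toy weights (hypothetical): b has 1 + 2 + 3 = 6 outgoing links.
weights = {("b", "a"): 1, ("b", "c"): 2, ("b", "d"): 3, ("d", "e"): 3, ("d", "c"): 2}
print(detachment(weights)[("b", "c")])          # inverse of an attention of 1/3
print(detachment_distance(weights, "b", "c"))   # the direct link is the cheapest path
print(detachment_distance(weights, "b", "e"))   # 2 + 5/3 via d
```

Computing detachment as a ratio of integer counts, rather than as the inverse of a floating-point attention, avoids a needless rounding step.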

B. Method for appraising preferential link creation 

While sophisticated regression models have been developed
in mathematical social science to measure the preference of
link creation (29), we stick here to a basic yet insightful
framework for comparing (i) the number of links actually
received by some kinds of nodes during a period of time,
with (ii) the potential number of such links, i.e. a kind of
"preferential attachment" measurement (30), here with respect
to any kind of property (31).

More precisely, we define f(x) as the propensity of formation
of new citation links between pairs of blogs (i, j) whose social
distance is d(i,j) = x. Put simply, higher propensity values indicate
a stronger likelihood for dyads at a certain distance to form, all
other things being equal.

We concretely compute the propensity f(x) as the proportion
of new links appearing in C during a given time period
[t + 1, t + T] and which were at social distance x at t, among
the whole set of possible such pairs at distance x:

$$f(x) = \frac{\left|\{(i,j) \text{ such that } \mathbf{C}_{t+T}(i,j) > \mathbf{C}_t(i,j) \text{ and } d(i,j) = x\}\right|}{\left|\{(i,j) \text{ such that } d(i,j) = x\}\right|} \qquad (2)$$

Empirically, we estimate various propensities for a series of
time steps $\{[t_k + 1, t_k + T] \text{ such that } t_k = 60 + kT,\ T = 7\}_{k \in \{0,\dots,7\}}$,
basically estimating the propensity at a weekly
rate, given all previous observed interactions, with the exception
that we start the computation only after an initialization
period of two months ($t_0 = 60$).

Propensities with respect to the social, detachment-based
and semantic distances are respectively denoted $f$, $f^{\partial}$ and $g$.
In the figures, all propensities are normalized for comparison
purposes.
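
The propensity estimate for one time window can be sketched as below: pairs are grouped by their distance at t, and the share of pairs that gained at least one link during the window is computed for each distance value. The function name, the argument layout and the toy snapshots are hypothetical; the distance function is passed in so that the same sketch applies to social, detachment-based (after binning) or semantic distances.

```python
from collections import Counter

def propensity(pairs, distance, weights_before, weights_after):
    """Share of pairs at each distance x whose aggregated weight increased.

    `distance(i, j)` is evaluated on the network at time t;
    `weights_before` / `weights_after` are aggregated weights at t and t + T.
    """
    possible = Counter()
    created = Counter()
    for i, j in pairs:
        x = distance(i, j)
        possible[x] += 1
        if weights_after.get((i, j), 0) > weights_before.get((i, j), 0):
            created[x] += 1
    return {x: created[x] / possible[x] for x in possible}

# Toy example (hypothetical): three blogs, distances given as a lookup table.
blogs = ["a", "b", "c"]
pairs = [(i, j) for i in blogs for j in blogs if i != j]
table = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 2}
before = {("a", "b"): 1, ("b", "c"): 1}
after = {("a", "b"): 2, ("b", "c"): 1, ("a", "c"): 1}
f = propensity(pairs, lambda i, j: table.get((i, j), float("inf")), before, after)
print(f)
```

In this toy case one of the two pairs at distance 1 repeats its link and the single pair at distance 2 creates one, while none of the unreachable pairs does.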

C. Linking and social distance 

We first analyze the effect of topological distances on new
citation creation by using the plain social distance $d$ and the
detachment-based distance $\partial$. Figure 2 depicts the results for
the social-distance-based propensity $f$; the trend for $f^{\partial}$ is
essentially similar, although not depicted here due to length
constraints. On the whole, both propensity profiles decrease
strongly, and generally exponentially, with higher distances,
reflecting the effect of structural/topological and attentional
remoteness on the likelihood of link creation: link creation basically
occurs in the topological neighborhood, often not much farther
than a couple of clicks away. Interestingly, above a certain
distance threshold propensities stop decreasing: in other words,
below a certain level of closeness, all bloggers are equally remote.

Specifically in terms of social distance, propensities are
at least about one order of magnitude larger at distance 1
than at other distances, for all networks. Links at distance 1 are
actually repeated links, indicating that most relationships, by
and large, tend to occur between already connected bloggers; then,
secondarily, towards friends of friends. Rather than speaking
of a "small-world", in this case, one would rather talk of a
"narrow-world" (32). When new links are established outside
this close circle, the propensity to cite decreases particularly
steeply with respect to social distance. Finally, propensities
relative to detachment-based distances, while indicative of
weighted attention-related processes, still mostly exhibit the
same behavior and confirm these topological effects.

Fig. 2. Propensity $f$ for new post citation in C as a function of social
distance d. Error bars indicate 95% confidence intervals on means.

Fig. 3. Semantic distance distributions. Triangles: distribution computed over
the whole set of possible pairs of blogs. Crosses: distribution computed on
pairs of blogs actually linked in the citation network C.


D. Linking and semantic distance 

Topology thus self-influences topology, yet content distribution
may admittedly play a role in further shaping network
structure; (11) demonstrated for instance how partisan divides
corresponded to structural ones in the political blogosphere
prior to the 2004 US elections.

Here, we can first appraise homophily statically, or a
posteriori, by observing the configuration of links already
present at t. We therefore measure the semantic distance $\delta$
between blogs, distinguishing the whole blogosphere from the
immediate neighborhood of blogs. We observe on Fig. 3 that
the immediate neighborhood is very significantly closer
semantically, when compared with the overall semantic distance
between pairs of blogs of the whole set B, indicating a very
strong a posteriori homophily.

This fact suggests a strong homophilic behavior in link
creation itself; in other words, it suggests a dynamic, or a
priori, homophily. To check this, we compute propensities for
link creation with respect to the semantic distance. The results
are plotted on Fig. 4 and clearly confirm the above hypothesis.
For instance, blogs at a semantic distance less than 0.2 are
about 10 times more likely to cite each other than blogs
at an average semantic distance, and 100 times more likely than
pairs of blogs strongly differing semantically.

Topological coevolution: To appraise how the social and
semantic effects mix together, we finally compute propensities
in a two-variable setting based on both social and semantic
distances, as shown on Fig. 5. The main conclusion is that,
outside of the close circle of repeated citations (d = 1), the
above-mentioned homophilic behavior has a noticeable effect,
even stronger with increasing social distances. In the case
of neighbors however (i.e. repeated citations), the semantic
distance has a mixed role. Citations are indeed more likely
towards very similar blogs, again ($\delta \in [0; 0.2[$), yet they are
also more likely, and even much more likely, towards very
dissimilar blogs ($\delta \in [0.8; 1]$).

IV. Evolution of Content: The Topology-based Dynamics of Diffusion

Topology thus evolves with respect to content distribution.
Yet, in a dual manner, how does the dynamics of content
circulation depend on topological features? To assess this,
we first need to introduce a notion of diffusion subgraphs
(Sec. IV-A) and some specific characteristics of the underlying
citation networks which are likely to influence
the diffusion phenomena, particularly attention-related features
(Sec. IV-B).

A. Diffusion subgraph 

More precisely, we focus on explicit diffusion events, which
correspond to simultaneously posting some content and referring
to another blog which already posted about this same
content. We therefore define the notion of diffusion subgraph,
which gathers every blog which mentioned a given URL in a
post, and every directed link (i,j) between these blogs such
that i simultaneously mentions the URL and refers to j, which
had previously mentioned that URL.

Technically, given a resource $u \in U$, we define $\sigma_u$, the
diffusion subgraph of u, as a pair of:

• blogs mentioning u in a post, and

• directed edges (i, j) of C such that i simultaneously both
cited j and mentioned u, after j mentioned u.
Formally, these transmission links are edges (i, j) such
that $\mathbf{C}_t(i,j) > \mathbf{C}_{t-1}(i,j)$ (i.e. there is a new link
in $\mathbf{C}_t$ from i to j at t), $U_t(i,u) = 1$ (i mentions u at t) and
$\exists t' < t,\ U_{t'}(j,u) = 1$ (i.e. j had mentioned u strictly
before t).

We denote such subgraphs $\sigma_u \in \mathcal{P}(B) \times \mathcal{P}(B \times B)$.

Fig. 4. Propensity for new post citation with respect to semantic distance $\delta$.

Fig. 5. Two-dimensional propensity with respect to social and semantic
distances.

Fig. 6. Illustration of a diffusion subgraph $\sigma_{u_0}$. Date labels indicate the
time when the origin blog both mentioned $u_0$ and did a post citation to the
destination.

We say that a diffusion subgraph is trivial if its edge set
is empty, i.e. if the corresponding URL is not involved in
any explicit diffusion event between two blogs. Of the 96,637
URLs of U, only 11,709 correspond to non-trivial diffusion
subgraphs over the whole collection period. In the remainder,
we only focus on these non-trivial subgraphs.
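
The extraction of a diffusion subgraph from dated posts can be sketched as follows. The post representation (day, blog, cited blogs, URLs), the function name and the toy posts are assumptions for illustration; in particular the day numbers are arbitrary, and the toy posts only mimic the kind of scenario described in the text (an initiator, relays on later days, and an isolated mention).

```python
def diffusion_subgraph(posts, url):
    """Return (nodes, edges) of the diffusion subgraph of `url`.

    `posts` is a list of (day, blog, cited_blogs, urls) tuples (hypothetical
    format). An edge (i, j, day) is kept when blog i, in a post of that day,
    both mentions `url` and cites a blog j that mentioned `url` strictly before.
    """
    first_mention = {}
    for day, blog, _, urls in posts:
        if url in urls:
            first_mention[blog] = min(day, first_mention.get(blog, day))
    edges = set()
    for day, blog, cited, urls in posts:
        if url not in urls:
            continue
        for other in cited:
            if other in first_mention and first_mention[other] < day:
                edges.add((blog, other, day))
    return set(first_mention), edges

# Toy posts (hypothetical days and blogs).
posts = [
    (110, "a", [], {"u0"}),
    (111, "c", ["a"], {"u0"}),
    (112, "b", ["a", "c"], {"u0"}),
    (112, "e", ["a"], {"u1"}),     # cites a, but about another URL
    (115, "f", [], {"u0"}),        # mentions u0 without citing anyone
    (118, "d", ["b"], {"u0"}),
]
nodes, edges = diffusion_subgraph(posts, "u0")
print(sorted(nodes))
print(sorted(edges))
print(not diffusion_subgraph(posts, "u1")[1])  # True when the subgraph is trivial
```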

Figure 6 provides an illustration of a real, non-trivial diffusion
subgraph, whose underlying post citation network has
previously been illustrated on Fig. 1. In this case, a given URL
$u_0$ is first mentioned in blog a. It is then mentioned by c on
Feb 19, who cites a on the same day. It then "diffuses" to b
both from a and c on the next day. Finally, blog d mentions
$u_0$ along with a reference to b on Feb 26.

We plotted on Fig. 7 the size distributions of the 11,709
non-trivial diffusion subgraphs. Sizes are markedly heterogeneous
both in terms of nodes and links, with a large number of
small subgraphs (this observation is consistent with the shape
of the cascade size distribution found in (6)). Most of these
subgraphs (7,016) consist of a unique transmission event (2 blogs
and one link), while there are 39,540 transmission events over
a total of 229,736 citation links, i.e. slightly more than one
sixth of post citations are also transmission links.

Fig. 7. Size distributions of diffusion subgraphs, in terms of nodes (red
squares) and links (blue triangles).

B. Diffusion-driven topological features 

1) Total attention: A quite simple ego-centered measure
likely to be relevant to the study of diffusion relates to the notion
of total attention exerted by a blog j, defined as the sum of
attentions exerted on all "attentive" blogs i:

$$a_t(j) = \sum_{i} a_t(i,j)$$

On Fig. 1, the total attention $a_t(c)$ exerted by blog c
aggregates attentions from blogs b, a, d, e and f towards c; it
is equal to 2.45.
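
Total attention amounts to summing, for each blog, the attention shares it receives from every blog citing it. The sketch below uses a hypothetical function name and hypothetical toy weights; since the network of Fig. 1 is not reproduced here, it does not attempt to recover the value 2.45 quoted above.

```python
def total_attention(weights):
    """weights: (i, j) -> aggregated citations from i to j.

    Returns j -> sum over citing blogs i of the share of i's outgoing
    links that point to j.
    """
    out_total = {}
    for (i, _), w in weights.items():
        out_total[i] = out_total.get(i, 0) + w
    total = {}
    for (i, j), w in weights.items():
        total[j] = total.get(j, 0.0) + w / out_total[i]
    return total

# Toy weights (hypothetical).
weights = {("b", "a"): 1, ("b", "c"): 2, ("b", "d"): 3, ("d", "e"): 3, ("d", "c"): 2}
attention_received = total_attention(weights)
print(attention_received)
```

Each citing blog distributes exactly one unit of attention, so the values sum to the number of blogs with at least one outgoing link (two in this toy case).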

2) Edge-range distance: In addition, we now need a notion
of structural distance that captures remoteness
between nodes which are already connected, because the study
of explicit diffusion is based on blogs which explicitly link
towards, and are thereby connected to, other blogs. To this end,
we use the notion of edge range, which has notably been
used recently in diffusion studies in (33) and which had
initially been defined in (34) for a link (i, j) as the distance
between i and j if link (i, j) were removed.

We extend this notion to the case of a graph weighted with
detachment values. Formally, we define the edge range r(i,j) of
link (i, j) in the weighted detachment-based graph D as the
minimal weighted distance between i and j when link (i, j)
is removed.

In other terms, it is the minimal sum of detachment values
along the "best" indirect path from i to j; or, so to say, the
minimal total attentional cost that a piece of information requires
to travel from blog j to i if the edge from i to j were removed.
More simply, it is also the detachment-based distance in a graph
where edge (i, j) has been removed.
See an example of edge range calculation on Fig. 8.
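
Edge range in the detachment-weighted graph can be sketched as a shortest-path search that ignores the direct link. The function name `edge_range` and the toy detachment values below are hypothetical: they were chosen so that, as in the situation described for Fig. 8, a three-step path is cheaper than a two-step path of cost 10, but they are not the values of that figure.

```python
import heapq

def edge_range(costs, i, j):
    """Cheapest path cost from i to j once the direct edge (i, j) is ignored.

    `costs` maps directed edges (src, dst) to detachment values.
    Returns float("inf") when no indirect path exists.
    """
    neighbours = {}
    for (src, dst), cost in costs.items():
        if (src, dst) != (i, j):
            neighbours.setdefault(src, []).append((dst, cost))
    best = {i: 0.0}
    heap = [(0.0, i)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == j:
            return dist
        if dist > best[node]:
            continue  # stale heap entry
        for nxt, cost in neighbours.get(node, ()):
            candidate = dist + cost
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return float("inf")

# Toy detachment values (hypothetical).
costs = {
    ("b", "c"): 3.0,
    ("b", "a"): 6.0, ("a", "c"): 4.0,                    # two steps, total cost 10
    ("b", "d"): 2.0, ("d", "e"): 1.5, ("e", "c"): 1.0,   # three steps, total cost 4.5
}
print(edge_range(costs, "b", "c"))
```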




Fig. 8. Edge-range calculation: we compute for instance edge range r(b, c).
The link from b to c is first removed before computing the minimal-cost
path from b to c, using detachment values D computed on C. On this
example, the path is (b → d → e → c) and r(b, c) is its total cost. Note that
paths with fewer steps such as (b → a → c) may happen to be actually more
expensive (with a cost of 10 in this very case).

Fig. 9. Mean number of first (blue dots) and second (red dots) transmission
links produced by initiating blogs depending on their total attention a.
(NB: distributions are plotted using 8 quantiles of a values to accommodate
their markedly heterogeneous spread).



C. Information relaying and attention 

The likelihood of a blog to be influential, by inducing content
diffusion, is often said to be directly related to the number
of links which flow into it (35; 15), influential bloggers
being those who have more incoming links or those who have
the largest audience. Following this standpoint, one can check
the influence of ego by examining how ego-centered measures
correlate with actual diffusion.

As a first step, we check the correlation between the total
attention of a blogger using a URL for the first time, and the
transmission links s/he induces, i.e. as an originator in the
corresponding diffusion subgraph. Figure 9 therefore depicts
in blue the mean absolute number of such first transmission
links provoked by blogs having a given total attention a.³

Higher total attention values are indeed correlated with a
larger number of transmission events. In other words, more
"influential" blogs seem, unsurprisingly, to be
those with a larger active readership, broadly speaking. However,
influence appears to increase more than linearly for total
attention values in the range of $5 \cdot 10^{-2}$ to 1, compared with
total attentions below $5 \cdot 10^{-2}$. This suggests that there is
a cumulative benefit of having a larger total attention;
however, this effect seems to be bounded as it vanishes for
even higher values: above a certain threshold, the increase in
influence is flatter, although still present. On the
whole, this "broken" shape suggests that the influence of an
initiator, as measured by the number of first transmission links,
is not a direct, linear result of attention.

D. Information shortcuts and edge range 

Beyond underlining immediate readership effects, i.e.
emphasizing that information transmission through citation
is more frequent among regularly cited blogs, this kind
of strictly ego-centered indicator is likely to provide little
knowledge of the wider picture of information pathways, i.e.
of what makes a piece of information
propagate more broadly, in a wider arena.

1) Second transmissions: To explore this, we choose to
focus on "second transmissions" in diffusion subgraphs.
Above, we showed that first transmissions
were likely to be initiated by blogs having a large total attention.
First transmissions are relative to a given initial source
(i.e. an initiator of a diffusion subgraph, who mentions a
resource u without citing another blogger who mentioned u
beforehand), while second transmissions are relative to a
blog which already relays a resource. In other terms, they relate
to the longevity of the diffusion phenomenon. Put simply, once
a resource has been transmitted, how likely is it to pursue its
way into the blogosphere?

As can be inferred from the red curve on Fig. 9, the
effectiveness of second transmissions is determined by the
attention of the initiator in roughly the same way as that of first
transmissions; in other words, attention does not tell
us more about the longevity of the informational cascade.

2) Weak ties and edge range: Rather, information spreading
could depend on more holistic features related to the position
of the pairs of individuals in the network: consistently
with the vast amount of sociological literature on diffusion,
information propagation could be more efficient along "weak
ties" connecting remote areas of a network (36). In this
respect, we use edge range values as they provide less local
information than ego-centered attentional profiles. Higher edge
range values are indeed typical of pairs of blogs which would
otherwise be relatively far apart within the network, in terms
of informational and attentional pathways, if the link between
them were absent. As such, higher values loosely indicate
weak ties (37).

3 Although not depicted here, we found similar correlations between the
number of transmission links and the audience size in the broad sense, as
measured by the number of incoming links. We nonetheless suggest that total
attention measures audience-related effects more precisely, as it considers
individual attentional landscapes, by weighting the number of links the
referred blog receives with the relative importance it bears for the referring
blogger.

Fig. 10. Mean number of second transmission links (k, j) with respect to
the edge range value r(j, i) of the first transmission. Data scarcity led us to
bin r into five quintiles.

In particular, we examine the hypothesis that a piece of information
which has been channeled through a weak tie as a "shortcut"
may be more "contagious" for further diffusion. To test this
hypothesis, we measure the number of transmission links in
each diffusion subgraph with respect to the edge range of the
edge from which the original resource was cited. In other
words, if i is an initiator in subgraph $\sigma_u$ and j cites i for u,
we then examine the number of blogs k in $\sigma_u$ such that (k, j)
are edges of $\sigma_u$, with respect to r(j, i).
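
The counting step just described can be sketched as follows: within one diffusion subgraph, each first transmission (j cites an initiator i) is associated with the number of blogs k that later cite j for the same resource. The function name and the toy edge set are hypothetical, and treating as initiator any cited blog that cites nobody within the subgraph is an assumption made for the sketch.

```python
def second_transmissions(edges):
    """edges: set of (citing, cited) transmission links of one diffusion subgraph.

    Returns {(j, i): number of blogs k such that (k, j) is an edge} for every
    first transmission (j, i), i.e. every edge whose cited blog i is an
    initiator (assumed here to be a blog citing nobody within the subgraph).
    """
    citing_blogs = {src for src, _ in edges}
    relays = {}
    for src, dst in edges:
        relays.setdefault(dst, set()).add(src)
    return {
        (j, i): len(relays.get(j, ()))
        for j, i in edges
        if i not in citing_blogs
    }

# Toy subgraph (hypothetical): a is the initiator, c and b relay it, d relays b.
edges = {("c", "a"), ("b", "a"), ("b", "c"), ("d", "b")}
counts = second_transmissions(edges)
print(counts)
```

Pairing each count with the edge range r(j, i) of the corresponding first transmission then gives the kind of data one would bin to study how second transmissions vary with edge range.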

The corresponding statistics, plotted on Fig. 10, show that
resources which transited through edges of higher r generally
tend to propagate to a greater number of blogs than for
lower r. This is however valid only below a certain threshold, after
which links seem to be too weak to efficiently provoke second
transmissions. As such, weak ties, i.e. links with higher edge range,
proportionally act more as catalysts for ongoing diffusions
in that they connect otherwise relatively remote areas.

To sum up, blogs (i) connected through a medium edge
range to (ii) a "high attention" blog achieve higher numbers of
second transmissions.



V. Conclusion 

Social and semantic dimensions are essentially co-determined
in this blog network: first, both social and semantic
topologies drive new interactions, specifically through a strong
homophilic behavior and link creation within the structural
neighborhood. Second, information circulation is shaped by both
social and attentional topology, in a broad framework
where influence is understood in relatively holistic terms.
In particular, we showed how specific structural features
may be associated with information pathways: while an
ego-centered property such as total attention indicates a higher
capacity to disseminate particular online resources, a
non-ego-centered property such as edge range indicates that weaker
links generally tend to bring richer diffusion in the longer term.
Higher attention combined, later on, with higher edge range
significantly enhances the capacity of an online resource to
be further diffused.

More broadly, we see this whole framework as a preliminary 
to a deeper understanding of the joint, coevolving dynamics 
of social and semantic structures, or the joint evolution of 
topology and information distribution, notably in the case 
where both dimensions evolve at comparable timescales. 